-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Stop cross-validation early if the parameters have high predicted test loss #915
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good. Since there is some very sophisticated code there, I included some comments to simplify code understanding.
Co-Authored-By: Valeriy Khakhutskyy <1292899+valeriy42@users.noreply.github.com>
Thanks for the review @valeriy42. I think I've now addressed all your comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Good job on explaining complex algorithmic bits.
lib/maths/CBoostedTreeFactory.cc
Outdated
@@ -46,7 +46,10 @@ const double MIN_DOWNSAMPLE_LINE_SEARCH_RANGE{2.0}; | |||
const double MAX_DOWNSAMPLE_LINE_SEARCH_RANGE{144.0}; | |||
const double MIN_DOWNSAMPLE_FACTOR_SCALE{0.3}; | |||
const double MAX_DOWNSAMPLE_FACTOR_SCALE{3.0}; | |||
const std::size_t MAX_NUMBER_FOLDS{5}; | |||
// This isn't a hard limit be we increase the number of default training folds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, something wrong with this sentence.
// So how does the following work: we'd like "c * f * # rows" training rows. | ||
// For k folds we'll have "(1 - 1 / k) * # rows" training rows. So we want | ||
// to find the smallest integer k s.t. c * f * # rows <= (1 - 1 / k) * # rows. | ||
// This gives k = ceil(1 / (1 - c * f)). However, we also upper bound this | ||
// by MAX_NUMBER_FOLDS. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very nice explanation!
This makes two changes: